Europaisches 
Patentamt 



European 
Patent Office 



Office européen
des brevets



;T/IB 03 /C 2,2 99 
.27.05.03 




Bescheinigung Certificate



Attestation 



Die angehefteten Unterlagen
stimmen mit der
ursprünglich eingereichten
Fassung der auf dem nächsten
Blatt bezeichneten
europäischen Patentanmeldung
überein.



The attached documents 
are exact copies of the 
European patent application 
described on the following 
page, as originally filed. 



Les documents fixés à
cette attestation sont
conformes à la version
initialement déposée de
la demande de brevet
européen spécifiée à la
page suivante.



Patentanmeldung Nr. Patent application No. Demande de brevet 

02077421.2 



PRIORITY
DOCUMENT

SUBMITTED OR TRANSMITTED IN
COMPLIANCE WITH RULE 17.1(a) OR (b)



Der Präsident des Europäischen Patentamts;
Im Auftrag

For the President of the European Patent Office
Le Président de l'Office européen des brevets

p.o.



R C van Dijk 






EPA/EPO/OEB Form 1014.1 - 02.2000 7001014 






Anmeldung Nr: 
Application no.: 
Demande no: 



02077421.2 



Anmeldetag: 
Date of filing: 
Date de dépôt:



19.06.02 



Anmelder/Applicant(s)/Demandeur(s):

Koninklijke Philips Electronics N.V.
Groenewoudseweg 1 
5621 BA Eindhoven 
PAYS-BAS 



Bezeichnung der Erfindung/Title of the invention/Titre de l'invention:
(Falls die Bezeichnung der Erfindung nicht angegeben ist, siehe Beschreibung.
If no title is shown please refer to the description.
Si aucun titre n'est indiqué se référer à la description.)

Audio signal processing apparatus 

In Anspruch genommene Priorität(en) / Priority(ies) claimed / Priorité(s)
revendiquée(s)

Staat/Tag/Aktenzeichen/State/Date/File no./Pays/Date/Numéro de dépôt:



Internationale Patentklassifikation/International Patent Classification/
Classification internationale des brevets:

H04H/ 



Am Anmeldetag benannte Vertragstaaten/Contracting states designated at date of
filing/Etats contractants désignés lors du dépôt:

AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR



02077421.2 
EPA/EPO/OEB Form 1014.2 - 01.2000 



7001014 






PHNL020527EPP 



Audio signal processing apparatus 



18.06.2002 






The invention relates to an audio signal processing apparatus comprising an
audio input for obtaining an entered audio signal, an audio output for outputting an outgoing
audio signal, and a processor for performing a transformation to improve the intelligibility of
speech present in the entered audio signal.

The invention also relates to a television receiver comprising such an audio
signal processing apparatus.

The invention also relates to a radio program receiver comprising such an 
audio signal processing apparatus. 

The invention also relates to a method for increasing the intelligibility of an
audio signal, comprising:

a first step of obtaining an entered audio signal;

a second step of transforming the entered audio signal into an outgoing audio
signal; and

a third step of outputting the outgoing audio signal.



An apparatus for improving the intelligibility of speech in a television receiver
is known from US-B-6,226,605. This patent describes the application of speech
intelligibility algorithms known from a hearing aid in a television receiver. One of the
algorithms in the known apparatus reproduces the speech at a lower speed by increasing the
duration of silent periods between spoken words. It is a drawback of the known apparatus
that the algorithms are designed to improve the intelligibility of speech for a particular
person, but do not take into account any specific non-person-related factors
that influence the intelligibility of speech in an audio signal.

It is a first object of the invention to provide an apparatus of the kind described
in the opening paragraph, which can improve the intelligibility of speech in a better way.

It is a second object of the invention to provide a television receiver of the
kind described in the opening paragraph, which has means to enhance the intelligibility of
speech present in the incoming television signal in a better way than known.

It is a third object of the invention to provide a radio program receiver of the
kind described in the opening paragraph, which has means to enhance the intelligibility of
speech present in the incoming radio signal in a better way than known.

It is a fourth object of the invention to provide a method for transforming an
audio signal of the kind described in the opening paragraph, to enhance the intelligibility of
speech present in the audio signal in a better way than known.
The first object is realized in that the processor has a noise level value and has
the ability to transform the entered audio signal into the outgoing audio signal by the
transformation modeling at least one aspect of the Lombard effect, based upon the noise level
value. The Lombard effect, or Lombard reflex, is a term indicating the changes of human
speech when a speaker speaks in an environment with noise. Human speech is not always the
same. A first class of speech changes are intended changes within a certain mode of speech.
E.g. a speaker can emphasize a word. A second class of speech changes are intended or
unintended changes to a different speech mode. E.g. speech characteristics change when a
speaker is tired, speaks in a vibrating environment or speaks in a noisy environment. Some of
the characteristics of the audio signal that change from normal to Lombard speech are e.g.
signal volume, word length and pitch. Speech improvement can be applied to any audio
signal, but is only useful when the audio signal contains some speech. The transformation
according to the invention can provide a faithful speech intelligibility improvement which
accurately models the changes from normal speech to Lombard speech, in which case one
needs an accurate characterization of noise inducing the Lombard speech mode. This faithful
transformation can either reproduce Lombard speech as a human utters it, or even improve
the intelligibility of speech more than a human. Alternatively the transformation can
approximate the Lombard effect, in which case it improves the speech intelligibility
suboptimally, based on a less accurate noise level value.

A rather trivial transformation, solely increasing the audio signal volume
depending on ambient noise, exists in the prior art. US-A-5,907,622 discloses an audio signal
processing system which changes the audio signal volume based upon an ambient noise
measurement, but performs no more advanced operations which further improve, in a higher
quality way, the intelligibility of speech in the audio signal. The audio signal processing
apparatus according to the invention implements at least one aspect of the Lombard effect
more complex than a simple signal volume adjustment, which is known in audio processing.
Most of the aspects of the Lombard effect belong to the field of speech processing rather than
to the field of audio signal processing. The audio signal processing apparatus according to the
invention may also perform an additional signal volume adjustment, but this is not the gist of
the invention.

In an embodiment of the audio signal processing apparatus of the invention, a
microphone and a noise value extractor are present for providing the noise level value to the
processor, from noise in the environment where the outgoing audio signal is reproduced.
With this embodiment the apparatus can improve the intelligibility of the entered audio signal
when noise is present in the environment of the audio signal processing apparatus. The
entered audio signal may already have been improved e.g. in a broadcasting studio, taking
into account noise present during recording. A broadcaster has no way of knowing what
noises occur during reproduction of the outgoing audio signal, and hence improvement has to
occur in the audio signal processing apparatus. To measure the noise of the environment of
the audio signal processing apparatus, a microphone picks up sounds in this environment. The
noise value extractor connected to the microphone generates a noise level value from an
entered electrical audio signal coming from the microphone and entering the noise value
extractor. Because in general the audio signal processing apparatus is connected to a
loudspeaker for reproducing the outgoing audio signal, the microphone picks up the sound
generated from the outgoing audio signal as well as other noise sounds present in the
environment of the audio signal processing apparatus. Preferably, the transformation
improves the intelligibility of speech depending on the noise level value derived from the
other noise sounds solely, and not from the sound generated from the outgoing audio signal.
To realize this, an adaptive echo cancellation algorithm may be present in the noise value
extractor to diminish the contribution of the sound generated from the outgoing audio signal
so that the noise level value is predominantly dependent on the other noise sounds in the
environment.

It is advantageous if a noise value characterizer is present, for retrieving the
noise level value from the entered audio signal. In some broadcasts, e.g. a report on site, e.g.
in a street, there is background noise present in the entered audio signal. A speaker may
already apply the Lombard effect to compensate for this background noise, but the nuisance
of the noise as perceived by the speaker is not necessarily equal to the nuisance in an audio
signal picked up by a microphone. Furthermore there is more noise added to the signal during
broadcasting and transmission, e.g. due to compression or other audio signal transformations.
It is hence desirable that a noise measurement can be done of the noise present in the entered
audio signal at the receiver side, to improve the intelligibility of the speech present in the
entered audio signal. Similar embodiments as embodiments of the audio signal processing
apparatus used at the receiver side can be used at the broadcaster side, to improve the
intelligibility of speech in the same way for all receivers.

It is advantageous if a selection input is present for setting the noise level
value to a chosen value. This enables a user to tune the intelligibility of the speech to his own
liking. If the transformation does not model the Lombard effect perfectly, or if the noise is
not characterized perfectly, or if the user just wants a partial, suboptimal speech intelligibility
improvement, the user can set the noise level value to such a value that the speech
intelligibility is improved the way he likes it.

It is also advantageous if a signal type characterizing means is present, for
delivering a signal type characterization value to the processor, for enabling the processor to
perform a transformation of the entered audio signal depending on the signal type
characterization value. E.g. the transformation is applied only when the signal type
characterization value indicates that speech is present in the entered audio signal. Or the
transformation is not applied when the signal type characterization value indicates e.g. that
classical music is present, irrespective of whether speech is present simultaneously with the
classical music. The signal type characterization value can be retrieved from additional data
present in a received signal, e.g. the program type information in the Radio Data System
(RDS). Furthermore the entered audio signal can be analyzed to determine whether it
contains e.g. speech or music, which is indicated by the signal type characterization value.

One of the aspects of the Lombard effect is that the spectral contour of the
entered audio signal is changed based upon the noise level value. E.g. the energy in a
formant, or steepness of a formant, can be changed. Also the width of a formant, or the
frequency of a formant can be changed. Alternatively a non-linear transformation can be
applied to the frequency axis of the spectrum yielding a new spectrum.

Another aspect of the Lombard effect is that the word length is changed based
upon the noise level value. E.g. a transformation which keeps the length of a piece of the
entered audio signal fixed, can shorten the silent periods between words to increase the
duration of voiced pieces, which corresponds to the slower reproduction of words.
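By way of a non-limiting illustration, the duration bookkeeping of such a fixed-length word stretch can be sketched as follows. The segment representation, the function name `stretch_words` and the clamping rule applied when there is too little silence are assumptions made for this sketch and are not taken from the description.

```python
def stretch_words(segments, factor):
    """Lengthen words by `factor` and shrink silences so the total length stays fixed.

    `segments` is a list of (kind, duration_in_seconds) pairs, where kind is
    "word" or "silence" (a hypothetical representation chosen for this sketch).
    """
    word_total = sum(d for kind, d in segments if kind == "word")
    silence_total = sum(d for kind, d in segments if kind == "silence")

    # Extra time the words need; it must be taken from the silent periods.
    extra = word_total * (factor - 1.0)
    if extra > silence_total:
        # Assumption: if there is not enough silence, stretch as far as possible.
        factor = 1.0 + silence_total / word_total
        extra = silence_total

    shrink = (silence_total - extra) / silence_total if silence_total > 0 else 1.0
    return [(kind, d * (factor if kind == "word" else shrink)) for kind, d in segments]
```

For example, stretching two words of 0.4 s and 0.5 s by a factor 1.2 within a 1.5 s piece reduces the two 0.3 s silences to 0.21 s each, so the piece still lasts 1.5 s.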

Furthermore the pitch or volume of the entered audio signal can be changed
based upon the noise level value.

More aspects of the Lombard effect are described in the literature, e.g. in "J.C.
Junqua: The Lombard reflex and its role on human listeners and automatic speech
recognizers. Journal of the Acoustical Society of America, vol. 93, no. 1, Jan. 1993, pp. 510-
524."

Instead of using a single noise level value characterizing the loudness of the
noise, other values can characterize the noise more completely, e.g. the other values can
characterize the frequency distribution of the noise.

The second object of the invention is realized in that a television receiver is
equipped with one of the embodiments of the audio signal processing apparatus described
above, to improve the intelligibility of speech present in an audio signal, which is extracted
by the television receiver from the television signal. The intelligibility of speech in a
television program is often not good enough to enable people with less acute hearing, e.g. the
elderly, to follow the television program well.

The third object of the invention is realized in that a radio program receiver is
equipped with one of the embodiments of the audio signal processing apparatus described
above, to improve the intelligibility of speech present in an audio signal, which is extracted
by the radio program receiver from the radio program. E.g. when a telephone conversation is
broadcast during the radio program, the person on the other end of the telephone line is
often hardly understandable.

The fourth object of the invention is realized in that the method obtains a noise
level value, indicating the extent of noise influencing the intelligibility of a reproduction of
the outgoing audio signal, and transforms the entered audio signal into the outgoing audio
signal by a transformation modeling at least one aspect of the Lombard effect not being audio
signal volume control, based upon the noise level value.

These and other aspects of the audio signal processing apparatus, the
television receiver, the radio program receiver and the method of the invention will be
apparent from and elucidated with reference to the implementations and embodiments
described hereinafter, and with reference to the accompanying drawings, which serve merely
as a non-limiting illustration of some of the aspects or embodiments of the audio signal
processing apparatus, the television receiver, the radio program receiver and the method
according to the invention.



In the drawings:

Fig. 1 is a generic form of the audio signal processing apparatus.
Fig. 2 is a specific embodiment containing more features.
Fig. 3 is an example of a Lombard effect transformation.
Fig. 4 is a television receiver containing the audio signal processing apparatus.
Fig. 5 is a radio program receiver containing the audio signal processing
apparatus, and
Fig. 6 shows schematically a Synchronized Overlap and Add synthesis.

In these Figures, elements with the same number in different Figures serve the
same function, and elements drawn dashed are optional depending on the desired
embodiment.



The audio signal processing apparatus 1 of Fig. 1 comprises an audio input 3
for obtaining an entered audio signal, and an audio output 5 for outputting an outgoing audio
signal. A processor 9 performs a transformation 2 to improve the intelligibility of speech
present in the entered audio signal, modeling at least one aspect of the Lombard effect. The
transformation 2 changes at least one characteristic of the entered audio signal based upon a
noise level value 7 which is available to the processor. In specific embodiments this noise
level value 7 can be measured e.g. from the environment of the audio signal processing
apparatus, in which case the processor 9 tries to improve the decreased intelligibility of a
reproduction of the outgoing audio signal, due to environmental noise entering the ear of a
listener. The outgoing audio signal may be reproduced by a loudspeaker 60.

Fig. 2 shows a more advanced embodiment of the audio signal processing
apparatus 1, containing more features. In a first noise level value 7 generation possibility,
noise in the environment is picked up by means of a microphone 11. Apart from truly
external noises in the environment, the microphone also picks up an audio signal component
generated by the reproduction of the outgoing audio signal by the loudspeaker 60, connected
to the audio signal processing apparatus 1. In a preferred embodiment, this audio signal
component is first subtracted from the signal coming from the microphone 11, or else the
noise value summarizer 102 delivers an incorrect noise level value 7, summarizing the extent
of the noise in the environment, to the processor 9. An approximation of the audio signal
component generated by the reproduction of the outgoing audio signal, traveling through a
room, is subtracted from the signal coming from the microphone by
means of an adaptive echo cancellation filter 101. The coefficients of this adaptive echo
cancellation filter 101 model the transmission of the reproduction of the outgoing audio
signal through the room, from the loudspeaker 60 to the microphone 11. The filter has as an
input an outgoing signal feedback 104 from the outgoing audio signal. If the adaptive echo
cancellation filter 101 is a digital linear filter, an optimal approximation of the audio signal
component generated by the reproduction of the outgoing audio signal by the loudspeaker 60
is obtained by minimizing the error e(k) in:

$$e(k) = M(k) - \hat{r}(k) = r(k) - \hat{r}(k) + n(k) \qquad [1]$$
In this formula, k is a sampling time instance, M(k) is the sampled value of the
signal coming from the microphone at sampling time instance k, $\hat{r}(k)$ is an estimate by the
adaptive filter of a sample r(k) of the audio signal component generated by the reproduction
of the outgoing audio signal by the loudspeaker 60, and n(k) is a sample of the truly
environmental noise as picked up by the microphone, which is desired by the noise value
summarizer 102 for generating the appropriate noise level value 7. The linear adaptive echo
cancellation filter 101 generates its output signal $\hat{r}(k)$ from its input o(k), which is the
sampled outgoing audio signal, e.g. by means of the following formula:

$$\hat{r}(k) = \sum_{p} w_p(k)\, o(k-p) \qquad [2]$$

The estimation of the filter coefficients $w_p(k)$ by minimizing the error e(k)
can be done in a number of ways, e.g. by a least squares technique. More information can be
obtained from the book "Simon Haykin: Adaptive filter theory. Prentice Hall 1986. ISBN
013004052-5 025, pp. 307-348." As an alternative to incorporation of an adaptive echo
cancellation filter 101, the reproduction of the outgoing audio signal by the loudspeaker 60
can be interrupted during a certain time slice, or the outgoing audio can be reproduced softly,
to improve the measurement of the truly external noises.
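By way of a non-limiting illustration, formulas [1] and [2] can be sketched with a normalized least-mean-squares (NLMS) coefficient update. NLMS is swapped in here as a simple stand-in for the least squares technique mentioned above; the function name `nlms_echo_cancel`, the number of taps, the step size `mu` and the regularization `eps` are assumptions of this sketch.

```python
def nlms_echo_cancel(mic, out, taps=8, mu=0.5, eps=1e-8):
    """Remove the loudspeaker component from the microphone signal.

    mic -- samples M(k) from the microphone
    out -- samples o(k) of the outgoing audio signal (feedback input)
    Returns the residual e(k), which approximates the environmental noise
    n(k) once the filter has converged, and the final coefficients w_p.
    """
    w = [0.0] * taps
    residual = []
    for k in range(len(mic)):
        # o(k), o(k-1), ..., o(k-taps+1); samples before the start count as 0.
        o_vec = [out[k - p] if k - p >= 0 else 0.0 for p in range(taps)]
        r_hat = sum(wp * op for wp, op in zip(w, o_vec))   # formula [2]
        e = mic[k] - r_hat                                 # formula [1]
        # NLMS update (assumed technique): step normalized by input energy.
        norm = sum(op * op for op in o_vec) + eps
        w = [wp + mu * e * op / norm for wp, op in zip(w, o_vec)]
        residual.append(e)
    return residual, w
```

In this sketch the residual is what would be handed to the noise value summarizer 102; a real room response would typically need far more taps than the default used here.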
The noise value summarizer can obtain the noise level value 7, e.g. by
averaging the noise power over a number of samples L, followed by a nonlinear
transformation f:

$$V = f\left(\frac{1}{L}\sum_{L} n^{2}(k)\right) \qquad [3],$$

in which formula V is the noise level value 7.
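By way of a non-limiting illustration, the noise value summarizer can be sketched as follows. The description leaves the nonlinear transformation f open; a decibel mapping is assumed here purely as an example, and the function name `noise_level_value`, the window length and the floor value are likewise assumptions.

```python
import math


def noise_level_value(noise_samples, L=256, floor_db=-100.0):
    """Average the noise power over the last L samples and map it through f.

    Assumption: f is 10*log10(.), clipped to `floor_db` so that silence
    yields a finite value.
    """
    recent = noise_samples[-L:]
    if not recent:
        return floor_db
    mean_power = sum(s * s for s in recent) / len(recent)
    if mean_power <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_power))
```

With this choice of f, a constant noise amplitude of 0.1 gives a noise level value of about -20.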
Since there are different possibilities for obtaining the noise level value 7, the
noise level value 7 obtained from the environment is delivered to the processor as an
environmental noise level value 21.

In a second noise level value 7 generation possibility, the noise present in the
entered audio signal is characterized. This noise also degrades the intelligibility of speech in
the outgoing audio signal. For this purpose a noise value characterizer 13 is included in an
embodiment of the audio signal processing apparatus 1. The noise value characterizer 13 can
estimate the noise in the entered signal, e.g. by calculating the signal power in frequency
bands outside the frequency range for speech. Another possibility is that the noise value
characterizer 13 uses the temporal characteristics of the entered audio signal. E.g. quieter
time slices, in between time slices containing speech, only contain noise. Some of these
features for distinguishing noise, voiced speech and other audio signal types are described in
literature, e.g. the High Zero-Crossing Rate ratio or the spectrum flux, which can be used in
different combinations to reliably differentiate between noise and speech. A number of
features are described in "L. Lu, H. Jiang, H.J. Zhang: A robust audio classification and
segmentation method. Proc. Int. Conf. on Multimedia, 2001, Ottawa (Canada), pp. 203-211."
Most of these features can be used both in the noise value characterizer 13 and in the
signal type characterizing means 17, for identifying whether speech is present in the entered
audio signal. The noise value characterizer 13 delivers a signal noise level value 23 to the
processor.
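By way of a non-limiting illustration, the temporal approach (quieter time slices between speech contain only noise) can be sketched as follows. The frame length, the fraction of frames treated as quiet and the function names are assumptions; the zero-crossing and spectrum flux features mentioned above are not implemented in this sketch.

```python
def frame_powers(signal, frame_len):
    """Mean power of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def signal_noise_power(signal, frame_len=160, quiet_fraction=0.2):
    """Estimate the noise power from the quietest frames of the entered signal.

    Assumption: the quietest `quiet_fraction` of the frames lie between
    words and therefore contain noise only.
    """
    powers = sorted(frame_powers(signal, frame_len))
    if not powers:
        return 0.0
    n_quiet = max(1, int(len(powers) * quiet_fraction))
    return sum(powers[:n_quiet]) / n_quiet
```

The returned power could then be mapped to the signal noise level value 23, for instance with the same nonlinear transformation as is used for the environmental noise.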

In a third noise level value 7 generation possibility, a listener enters a noise
level value 7 manually, to allow the transformation 2 to optimally improve the intelligibility
of speech in the outgoing audio signal, according to the preference of the listener. This can be
done e.g. by increasing or decreasing the current noise level value 7, by pushing one or more
buttons on a remote control unit 105, which sends a control input signal to a selection input
15, from which a selected noise level value 25 is delivered to the processor 9 by means of a
noise value stripper 103, which strips the selected noise level value 25 from the control input
signal.

From the environmental noise level value 21, the signal noise level value 23
and the selected noise level value 25, a single noise level value 7 can be generated in a
number of ways. E.g. the noise level value 7 can be set equal to the sum of the environmental
noise level value 21 and the signal noise level value 23. Another possibility is that the noise
level value 7 is set equal to the selected noise level value 25.
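By way of a non-limiting illustration, the two combination possibilities just described can be sketched in one function. Giving the manually selected value priority whenever it is present is an assumption of this sketch; the description presents the two possibilities only as alternatives.

```python
from typing import Optional


def combine_noise_levels(environmental: float,
                         signal: float,
                         selected: Optional[float] = None) -> float:
    """Derive the single noise level value 7 from the values 21, 23 and 25."""
    if selected is not None:
        # Second possibility: use the selected noise level value 25.
        return selected
    # First possibility: sum of environmental (21) and signal (23) values.
    return environmental + signal
```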

As further shown in Fig. 2, an embodiment of the audio signal processing
apparatus 1 can contain a signal type characterizing means 17, which delivers a signal type
characterization value 18 to the processor 9. Since humans apply the Lombard effect to their
speech under noisy conditions, applying the transformation 2 modeling aspects of the
Lombard effect to the entered audio signal is mainly interesting when the entered audio
signal contains some speech. If the entered audio signal contains only e.g. music or other
sounds, e.g. the sound of an animal in a nature documentary, applying a speech intelligibility
improving transformation is useless, and the transformation can even deteriorate the quality
of the audio signal. Therefore it is interesting to include a signal type characterizing means 17
which can indicate when speech is present in the entered audio signal, and if necessary also
how much speech or what type of speech is present. There are a number of alternatives for
the signal type characterizing means 17 to obtain the signal type characterization value 18.
Often textual service information is provided by the broadcaster together with the audio. This
service information can indicate e.g. whether the audio corresponds to e.g. a jazz song or a
news bulletin. Additionally the signal type characterizing means 17 can use algorithms for
analyzing the entered audio signal itself to estimate whether speech is present. E.g. speech
often has a more pronounced modulation than music, which means that there are relatively
silent time slices in between loud, voiced time slices. Another example of speech / music
discrimination is described in US-A-5,878,391. In case there is only music present in the
entered audio signal, e.g. a transformation can be applied which sets equalizer settings
dependent on the type of music.
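By way of a non-limiting illustration, a signal type characterization based on service information and on the modulation property of speech can be sketched as follows. The labels in `SPEECH_SERVICE_TYPES` are hypothetical and are not actual RDS program type codes; the half-of-mean power criterion and the decision threshold are likewise assumptions of this sketch.

```python
# Hypothetical service labels that are taken to indicate speech content.
SPEECH_SERVICE_TYPES = {"news", "talk"}


def low_energy_ratio(signal, frame_len=160):
    """Fraction of frames whose power is below half the mean frame power."""
    powers = [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not powers:
        return 0.0
    mean_power = sum(powers) / len(powers)
    quiet = sum(1 for p in powers if p < 0.5 * mean_power)
    return quiet / len(powers)


def signal_type_value(signal, service_type=None, frame_len=160, threshold=0.3):
    """Return "speech" or "other" as a simple signal type characterization."""
    if service_type is not None:
        # Service information, when provided, overrides the signal analysis.
        return "speech" if service_type in SPEECH_SERVICE_TYPES else "other"
    ratio = low_energy_ratio(signal, frame_len)
    return "speech" if ratio > threshold else "other"
```

A processor could apply the transformation 2 only when the returned value is "speech".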

Fig. 3 shows an example of a realization of the transformation 2 modeling
some of the aspects of the Lombard effect. First the signal is processed by a pitch modifier
51. Pitch is a psycho-acoustical property which is derived by a human from a sound. There
exist technical correlates for pitch however. Voiced speech production can be modeled as a
train of Dirac impulses, representing an excitation by the vocal cords, which is filtered by a
filter representing the resonances in the vocal tract, the glottal source spectrum, and the
radiation load spectrum. Details can be found e.g. in "R. W. Schafer and L. R. Rabiner:
System for automatic formant analysis of voiced speech. Journal of the Acoustical Society of
America, vol. 47, no. 2, 1970, pp. 634-648." and "B.S. Atal and S.L. Hanauer: Speech
analysis and synthesis by linear prediction of the speech wave. Journal of the Acoustical
Society of America, vol. 50, no. 2, 1971, pp. 637-655." The pitch of speech is determined by
the period of the Dirac impulses. In practice the first peak in the audio signal spectrum, or the
autocorrelation of the audio signal can be used for determining a pitch of an audio signal.
E.g. with the autocorrelation method, the pitch T is the time shift which maximizes the
correlation:

$$T = \arg\max_{\tau}\; \mathbf{i}^{T}(k)\, \mathbf{i}(k+\tau) \qquad [4]$$



where the in-product is typically calculated over a certain number of samples
S of the audio signal i(k), and the small T in the exponent denotes transposition.
Depending on the noise level value 7, V, a new pitch T' is calculated, e.g. with the following
piecewise linear formula:



$$T' = \alpha_i V T + \beta_i, \quad \text{for } N_i \le V < N_{i+1} \qquad [5],$$

where the constants $\beta_i$ are chosen so that the curve is continuous.
Hence the more noise is measured, the higher the new pitch T' is. 
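By way of a non-limiting illustration, the autocorrelation pitch estimate and the piecewise linear mapping can be sketched as follows. The lag search range, the convention that T' equals T at the lowest breakpoint, and the clamping above the highest breakpoint are assumptions of this sketch; the description only requires the constants to make the curve continuous.

```python
def pitch_lag(signal, min_lag, max_lag):
    """Time shift (in samples) maximizing the in-product over S samples."""
    S = len(signal) - max_lag          # same number of samples for every lag
    if S <= 0:
        raise ValueError("signal too short for the requested lag range")
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(signal[k] * signal[k + lag] for k in range(S))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


def new_pitch(T, V, breakpoints, slopes):
    """Piecewise linear T' = alpha_i * V * T + beta_i for N_i <= V < N_{i+1}.

    breakpoints -- N_0 < N_1 < ... < N_m
    slopes      -- alpha_0 ... alpha_{m-1}, one per segment
    """
    if V <= breakpoints[0]:
        return T                       # assumption: no change below N_0
    V = min(V, breakpoints[-1])        # assumption: clamp above N_m
    beta = T - slopes[0] * breakpoints[0] * T   # makes T' = T at V = N_0
    for i, alpha in enumerate(slopes):
        hi = breakpoints[i + 1]
        if V <= hi:
            return alpha * V * T + beta
        # Continuity at N_{i+1}: both segments must give the same T' there.
        beta += (alpha - slopes[i + 1]) * hi * T
    raise ValueError("breakpoints and slopes are inconsistent")
```

With positive slopes the mapped value grows with V, in line with the statement that more measured noise gives a higher new pitch T'.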
A new signal now has to be synthesized with the new pitch. A number of
variants on the Synchronized Overlap and Add (SOLA) technique can be used, e.g. Pitch
Synchronous Overlap and Add (PSOLA) or Waveform Similarity based Overlap and Add
(WSOLA). These techniques exploit the fact that in an audio signal there are long periodicity
time slices, which contain a similar excitation waveform a number of times, e.g. 50 times.
These excitation waveforms are generated by the vocal tract in response to the Dirac impulse
excitations from the vocal cords. A slower phenomenon of change of the vocal tract, e.g. by
opening the mouth, is reflected in the audio signal by the fact that after the e.g. 50 similar
excitation waveforms, a new excitation waveform is repeated a number of times.

If e.g. it is desired to generate a new audio signal with the same pitch, but a
shorter duration, only e.g. 40 of the 50 excitation waveforms are copied to the new audio
signal. If a signal is required with the same duration, but a higher pitch, a greater number of
excitation waveforms are copied into a time slice of the same duration of the new audio
signal, and the excitation waveforms are added where they overlap.

This principle is illustrated schematically in Fig. 6, which shows an old audio
signal 301, which is converted to a new audio signal 303 of higher pitch. At a first synthesis
time instance 307, a first new waveform 311 of the new audio signal is constructed in the
temporal environment of the first synthesis time instance 307. This first new waveform 311
corresponds to a first old waveform 309 of the old audio signal 301. The first analysis time
instance 305 at which we perform excision of the first old waveform 309 is determined by the
first synthesis time instance 307 and the relationship between the old and the new pitch. The
synthesis of the new audio signal 303 can be summarized in the following formula:
$$y(k) = \sum_{i} w(k - iT + \Delta_i)\, x\big(k - iT + \Delta_i + \tau^{-1}(iT)\big) \qquad [6]$$



In equation [6], the new audio signal 303 y(k) is synthesized at all discrete
time instances by the overlap-addition, at a discrete number of synthesis time instances
positioned a temporal distance T apart, of waveforms excised from the old audio signal. It
is further assumed in equation [6] that both the excised and synthesized waveforms are
weighted by the same window w. $\tau^{-1}(iT)$ is the analysis time instance corresponding to a
synthesis time instance iT, where excision of a waveform from the old audio signal has to
occur. However, when adding an excised waveform to a part of the new audio signal already
synthesized, one has to be careful that an excised waveform from the old audio signal
resembles closely an excitation waveform which is expected to follow the part of the new
audio signal already synthesized. Therefore a small offset $\Delta_i$ is introduced, which allows for
excision of a waveform at a slightly different discrete time than $\tau^{-1}(iT)$. This is illustrated
schematically in Fig. 6 by the fact that at both the third synthesis time instance 323 and the
fourth synthesis time instance 327, the same excised third old waveform 325 is added to the
part of the new audio signal 303 already synthesized.
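By way of a non-limiting illustration, the overlap-add synthesis can be sketched in a simplified form in which every offset $\Delta_i$ is fixed to zero, so that no waveform similarity search is performed. The raised-cosine window, the function names and the user-supplied mapping `analysis_time` (standing in for $\tau^{-1}$) are assumptions of this sketch.

```python
import math


def hann(m, half):
    """Raised-cosine window centred at 0, zero for |m| >= half."""
    if abs(m) >= half:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * m / half))


def overlap_add(x, T, analysis_time, n_out):
    """Synthesize y(k) by overlap-adding windowed waveforms excised from x.

    x             -- old audio signal
    T             -- distance in samples between synthesis time instances
    analysis_time -- maps a synthesis instance iT to an analysis instance
    n_out         -- number of output samples
    Simplification: the offset Delta_i is 0 for every i.
    """
    y = [0.0] * n_out
    for i in range(n_out // T + 2):
        s = i * T                       # synthesis time instance iT
        a = analysis_time(s)            # corresponding analysis time instance
        for k in range(max(0, s - T + 1), min(n_out, s + T)):
            src = k - s + a             # sample position in the old signal
            if 0 <= src < len(x):
                y[k] += hann(k - s, T) * x[src]
    return y
```

With this window, neighbouring windows sum to one, so the identity mapping `analysis_time = lambda s: s` reproduces the old signal, while e.g. `lambda s: s // 2` reads the old signal at half speed.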

More details of various SOLA techniques can be found e.g. in "W. Verhelst,
D. Van Compernolle and P. Wambacq: A unified view on synchronized overlap-add methods
for prosodic modification of speech. Proceedings of the International Conference on Spoken
Language Processing. Beijing, October 2000, pp. 63-66." Another example of audio signal
pitch modification is given in US-A-5,479,564.

Second, after pitch modification, the signal is processed by a formant enhancer
53. A formant is a resonance in the vocal tract, which can be modeled by a pole of a vocal
tract modeling filter. The formant enhancer 53 achieves its goal e.g. by applying an
Autoregressive-moving-average (ARMA) filter to the audio signal leaving the pitch modifier
51, which filter is designed to increase the heights of the formant peaks, while deepening the
stretches of the spectrum in between the formants. This increases the steepness of the
formants. The ARMA filter coefficients are based upon the noise level value 7. The more
noise is measured, the more the formant heights are increased.
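By way of a non-limiting illustration, an ARMA filter with noise-dependent coefficients can be sketched as follows. The coefficient rule used here, a filter of the form A(z/g_num)/A(z/g_den) built from linear prediction coefficients, is borrowed from adaptive postfilters used in speech coding and is not taken from the description; the mapping from V to the factor `gamma_num` and all parameter values are assumptions.

```python
def arma_filter(b, a, x):
    """Direct-form ARMA filter with a[0] == 1:
    y[n] = sum_j b[j]*x[n-j] - sum_{j>=1} a[j]*y[n-j]."""
    y = []
    for n in range(len(x)):
        acc = sum(b[j] * x[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
        y.append(acc)
    return y


def formant_enhancer_coeffs(lpc, V, v_max=1.0, gamma_den=0.8, max_spread=0.3):
    """Coefficients of A(z/gamma_num) / A(z/gamma_den).

    lpc -- [1, a_1, ..., a_P] of A(z), where 1/A(z) models the vocal tract
    V   -- noise level value; more noise lowers gamma_num, which sharpens
           the formant peaks (assumed mapping, saturating at v_max)
    """
    strength = max(0.0, min(1.0, V / v_max))
    gamma_num = gamma_den - max_spread * strength
    b = [c * gamma_num ** j for j, c in enumerate(lpc)]
    a = [c * gamma_den ** j for j, c in enumerate(lpc)]
    return b, a
```

At V = 0 the numerator and denominator coincide and the filter leaves the signal unchanged; as V grows, the filter increasingly emphasizes the resonances described by the prediction coefficients.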

Third, a word stretcher 55 increases the duration of words, by decreasing the
duration of the silent time slices between words. E.g. a constant word stretch can be applied
according to the following formula:

w' = Cw when V > N [7],

in which w is the duration of a word, w' is the duration of the stretched word, C is a
multiplication constant and N is a
threshold which V, the noise level value 7, must exceed for word stretching to occur. Hence
in the implementation of formula [7], if the measured noise level value 7 is high enough, the
words are stretched by a predetermined percentage.
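
By way of illustration only, formula [7] can be sketched as an operation on a list of word and silence durations. The sketch assumes that the speech has already been segmented into words and silences, that the silences are shortened proportionally so that the total duration stays the same, and that C is capped so that no silence becomes negative. The function name stretch_durations, its parameters and the default values are hypothetical.

```python
def stretch_durations(segments, noise_level, c=1.2, threshold=0.5):
    """Apply w' = C*w to word segments when noise_level exceeds threshold.

    segments: list of (kind, duration_s) tuples, kind is "word" or "silence".
    Silences are shortened proportionally so that the total duration is
    unchanged (an assumption of this sketch).
    """
    if noise_level <= threshold:
        return list(segments)  # formula [7] only applies when V > N
    words = sum(d for kind, d in segments if kind == "word")
    silences = sum(d for kind, d in segments if kind == "silence")
    if words == 0 or silences == 0:
        return list(segments)  # nothing to stretch, or no silence to take from
    # Assumption: cap C so that the silences are never made negative.
    c_eff = min(c, 1.0 + silences / words)
    silence_scale = max(0.0, (silences - (c_eff - 1.0) * words) / silences)
    return [
        (kind, d * c_eff if kind == "word" else d * silence_scale)
        for kind, d in segments
    ]
```

For example, with C = 1.2 and a noise level value above the threshold, words lasting 0.4 s and 0.6 s are stretched to 0.48 s and 0.72 s, and the added 0.2 s is taken from the silences in between.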
Fourth, a signal amplifier 57 boosts the signal power in response to the noise
level value, e.g. by means of the following formula:

A = DV [8],

in which A is the amplification factor and D a constant.
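
By way of illustration only, formula [8] can be sketched as a gain applied to audio samples. The sketch assumes samples normalized to the range -1 to 1. It adds two safeguards that are not part of formula [8]: a minimum gain of 1, so that a quiet environment leaves the signal unchanged, and hard clipping at full scale. The function name amplify and its parameters are hypothetical.

```python
def amplify(samples, noise_level, d=2.0, min_gain=1.0, limit=1.0):
    """Scale samples by A = D*V, as in formula [8].

    min_gain and limit are assumptions of this sketch: the gain never drops
    below min_gain, and the output is clipped to [-limit, limit].
    """
    gain = max(min_gain, d * noise_level)
    return [max(-limit, min(limit, gain * s)) for s in samples]
```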

After applying these transformations, the outgoing sound is more intelligible.

It is possible that a user of the audio signal processing apparatus 1 activates
only some of the described aspects, depending on what he thinks produces the most
intelligible speech.

Fig. 4 shows a television receiver 30, which contains the audio signal
processing apparatus 1 for improving the intelligibility of speech present in the audio signal
of the received television signal. A television signal enters the television receiver 30 through
a television signal input 203. A television baseband audio extraction unit 209 can, if
necessary, tune to a desired television channel, demodulate and decompress the television
signal, and separate the audio and service information present in the television signal from
the video information. The television signal can come from a number of sources, e.g. a
satellite dish, a VCR, or the Internet. The audio output 5 sends the outgoing audio signal to a first
loudspeaker 205 of the television receiver 30 or a loudspeaker externally connected to the
television receiver 30. If a second loudspeaker is present, this second loudspeaker can receive
the outgoing audio signal from the audio output 5, or from a second audio output, in which
case a different transformation 2 may be applied to the entered audio signal to obtain a
second outgoing audio signal. The outgoing audio signal can also be sent to an audio signal
recorder. The fact that only one audio signal path is shown does not imply that the
transformation 2 can only be applied to mono audio signals, but rather the same type of
transformation 2 can be applied to a selection of at least some of the channels present in
multi-channel audio, e.g. coming from a DVD.

Fig. 5 shows a radio program receiver 40 which contains the audio signal
processing apparatus 1 for improving the intelligibility of speech present in the received
audio signal. After a radio program signal enters a radio program input 213, a radio
baseband audio extraction unit 219 may extract a
baseband radio signal from the radio program signal, by performing if necessary a tuning
step, demodulation step, decompression step, etc. The outgoing audio signal is sent to a
loudspeaker, e.g. the externally connected loudspeaker 211.

It should be noted that the above-mentioned embodiments illustrate rather than
limit the invention, and that those skilled in the art will be able to design alternatives, without
departing from the scope of the claims. Apart from combinations of elements of the invention
as combined in the claims, other combinations of the elements within the scope of the
invention as perceived by one skilled in the art are covered by the invention. Any
combination of elements can be realized in a single dedicated element. Any reference sign
between parentheses in the claim is not intended for limiting the claim. The word
"comprising" does not exclude the presence of elements or aspects not listed in a claim. The
word "a" or "an" preceding an element does not exclude the presence of a plurality of such
elements. The invention can be implemented by means of hardware or by means of software
running on a computer.
CLAIMS:
1. An audio signal processing apparatus comprising an audio input for obtaining
an entered audio signal, an audio output for outputting an outgoing audio signal, and a
processor for performing a transformation to improve the intelligibility of speech present in
the entered audio signal, characterized in that the processor is arranged to obtain a noise level
value indicating the extent of noise influencing the intelligibility of a reproduction of the
outgoing audio signal, and has the ability to transform the entered audio signal into the
outgoing audio signal by the transformation modeling at least one aspect of the Lombard
effect, not being audio signal volume control, based upon the noise level value.

2. An audio signal processing apparatus as claimed in claim 1, characterized in
that a microphone and a noise value extractor are present for providing the noise level value
from environmental noise to the processor.

3. An audio signal processing apparatus as claimed in claim 1 or 2, characterized
in that a noise value characterizer is present, for retrieving the noise level value from the
entered audio signal.

4. An audio signal processing apparatus as claimed in claim 1 or 3, characterized 
in that a selection input is present for setting the noise level value to a chosen value. 


5. An audio signal processing apparatus as claimed in claim 1 or 3, characterized
in that a signal type characterizing means is present, for delivering a signal type
characterization value to the processor, for enabling the processor to perform the
transformation of the entered audio signal depending on the signal type characterization
value.



6. An audio signal processing apparatus as claimed in claim 1, characterized in 

that the transformation changes a spectral contour of the entered audio signal, based upon the 
noise level value. 
7. An audio signal processing apparatus as claimed in claim 1, characterized in
that the transformation changes a word length of the entered audio signal, based upon the
noise level value.



8. A television receiver which is able to improve the intelligibility of speech
present in an entered audio signal, characterized in that an audio signal processing apparatus
is present, comprising an audio input for obtaining an entered audio signal, an audio output
for outputting an outgoing audio signal, and a processor for transforming the entered audio
signal into the outgoing audio signal by a transformation modeling at least one change to an
audio signal selected from aspects of the Lombard effect, based upon a noise level value
available to the processor.

9. A radio program receiver which is able to improve the intelligibility of speech
present in an entered audio signal, characterized in that an audio signal processing apparatus
is present, comprising an audio input for inputting an entered audio signal, an audio output
for outputting an outgoing audio signal, and a processor for transforming the entered audio
signal into the outgoing audio signal by a transformation modeling at least one change to an
audio signal selected from aspects of the Lombard effect, based upon a noise level value
available to the processor.

10. A method for increasing the intelligibility of speech in an audio signal
comprising
- a first step of obtaining an entered audio signal;
- a second step of transforming the entered audio signal into an outgoing audio
signal; and
- a third step of outputting the outgoing audio signal,
characterized in that the method obtains a noise level value, indicating the extent of noise
influencing the intelligibility of a reproduction of the outgoing audio signal, and transforms
the entered audio signal into the outgoing audio signal by a transformation modeling at least
one aspect of the Lombard effect, not being audio signal volume control, based upon the
noise level value.
ABSTRACT:


An audio signal processing apparatus (1) comprises an audio input (3) for an 
entered audio signal, an audio output (5) for outputting an outgoing audio signal, and a 
processor (9) for performing a transformation (2) to improve the intelligibility of speech 
present in the entered audio signal. The transformation (2) transforms the entered audio
signal into the outgoing audio signal, by modeling at least one aspect of the Lombard effect, 
based upon a noise level value (7). The Lombard effect is a specific way in which people 
change their speech, when speaking in noisy environments. The audio signal processing 
apparatus can be applied in a television receiver and a radio program receiver. 



Figure 1.
FIG. 6